This project is a proof of concept for the Atlanta Regional Commission that demonstrates the capability of convolutional neural networks to accurately classify aerial imagery of the Atlanta area into a specific land cover class. This particular segment of the project shows a baseline level of accuracy to be expected with five different image classifications: High density residential, medium density residential, low density residential, forest, and commercial. Additional model tuning and imagery analysis should be able to further increase overall accuracy.
The imagery for this part of the project is from 2010. The idea is to generate a robust CNN (Convolutional Neural Network) based on that imagery, and when new data becomes available, the model can be tweaked and improved with higher quality imaging technology.
require(tidyverse)
require(keras)
require(caret)
require(pROC)
require(imager)
require(sqldf)
require(randomForest)
The neural network structure requires that images be separated into folders based on their classification.
set.seed(314159)
filepath <- "F:\\LandPro_2010_Imagery\\fulton_2010_jpg"
load("F:\\LandPro_2010_Imagery\\Preliminary Work\\02282019image.h5")
model <- load_model_hdf5("F:\\LandPro_2010_Imagery\\Preliminary Work\\02082019cnn1.h5")
files <- list.files(path = filepath)
cat111images <- files[grep(".*_111.jpg", files)]
cat112images <- files[grep(".*_112.jpg", files)]
cat113images <- files[grep(".*_113.jpg", files)]
cat12images <- files[grep(".*_12.jpg", files)]
cat40images <- files[grep(".*_40.jpg", files)]
dir.create(paste0(filepath, "\\train", sep = ""))
dir.create(paste0(filepath, "\\validation", sep = ""))
dir.create(paste0(filepath, "\\test", sep = ""))
dir.create(paste0(filepath, "\\train\\111", sep = ""))
dir.create(paste0(filepath, "\\validation\\111", sep = ""))
dir.create(paste0(filepath, "\\test\\111", sep = ""))
dir.create(paste0(filepath, "\\train\\112", sep = ""))
dir.create(paste0(filepath, "\\validation\\112", sep = ""))
dir.create(paste0(filepath, "\\test\\112", sep = ""))
dir.create(paste0(filepath, "\\train\\113", sep = ""))
dir.create(paste0(filepath, "\\validation\\113", sep = ""))
dir.create(paste0(filepath, "\\test\\113", sep = ""))
dir.create(paste0(filepath, "\\train\\12", sep = ""))
dir.create(paste0(filepath, "\\validation\\12", sep = ""))
dir.create(paste0(filepath, "\\test\\12", sep = ""))
dir.create(paste0(filepath, "\\train\\40", sep = ""))
dir.create(paste0(filepath, "\\validation\\40", sep = ""))
dir.create(paste0(filepath, "\\test\\40", sep = ""))
val111 <- sample(1:length(cat111images), round(0.3*length(cat111images)))
test111 <- setdiff(sample(1:length(cat111images), round(0.3*length(cat111images))), val111)
train111 <- setdiff(1:length(cat111images), union(val111, test111))
val112 <- sample(1:length(cat112images), round(0.3*length(cat112images)))
test112 <- setdiff(sample(1:length(cat112images), round(0.3*length(cat112images))), val112)
train112 <- setdiff(1:length(cat112images), union(val112, test112))
val113 <- sample(1:length(cat113images), round(0.3*length(cat113images)))
test113 <- setdiff(sample(1:length(cat113images), round(0.3*length(cat113images))), val113)
train113 <- setdiff(1:length(cat113images), union(val113, test113))
val12 <- sample(1:length(cat12images), round(0.3*length(cat12images)))
test12 <- setdiff(sample(1:length(cat12images), round(0.3*length(cat12images))), val12)
train12 <- setdiff(1:length(cat12images), union(val12, test12))
val40 <- sample(1:length(cat40images), round(0.3*length(cat40images)))
test40 <- setdiff(sample(1:length(cat40images), round(0.3*length(cat40images))), val40)
train40 <- setdiff(1:length(cat40images), union(val40, test40))
file.copy(file.path(filepath, cat111images[test111]), file.path(paste0(filepath, "\\test\\111\\", sep = "")))
file.copy(file.path(filepath, cat111images[val111]), file.path(paste0(filepath, "\\validation\\111\\", sep = "")))
file.copy(file.path(filepath, cat111images[train111]), file.path(paste0(filepath, "\\train\\111\\", sep = "")))
file.copy(file.path(filepath, cat112images[test112]), file.path(paste0(filepath, "\\test\\112\\", sep = "")))
file.copy(file.path(filepath, cat112images[val112]), file.path(paste0(filepath, "\\validation\\112\\", sep = "")))
file.copy(file.path(filepath, cat112images[train112]), file.path(paste0(filepath, "\\train\\112\\", sep = "")))
file.copy(file.path(filepath, cat113images[test113]), file.path(paste0(filepath, "\\test\\113\\", sep = "")))
file.copy(file.path(filepath, cat113images[val113]), file.path(paste0(filepath, "\\validation\\113\\", sep = "")))
file.copy(file.path(filepath, cat113images[train113]), file.path(paste0(filepath, "\\train\\113\\", sep = "")))
file.copy(file.path(filepath, cat12images[test12]), file.path(paste0(filepath, "\\test\\12\\", sep = "")))
file.copy(file.path(filepath, cat12images[val12]), file.path(paste0(filepath, "\\validation\\12\\", sep = "")))
file.copy(file.path(filepath, cat12images[train12]), file.path(paste0(filepath, "\\train\\12\\", sep = "")))
file.copy(file.path(filepath, cat40images[test40]), file.path(paste0(filepath, "\\test\\40\\", sep = "")))
file.copy(file.path(filepath, cat40images[val40]), file.path(paste0(filepath, "\\validation\\40\\", sep = "")))
file.copy(file.path(filepath, cat40images[train40]), file.path(paste0(filepath, "\\train\\40\\", sep = "")))
Designating file paths for Training and Validation images for use in the creation of the neural network below.
train_dir <- file.path(paste(filepath, "\\train", sep = ""))
validation_dir <- file.path(paste(filepath, "\\validation", sep = ""))
CNNs learn patterns by taking a small sample of images (referred to as a batch) and learning patterns from those sample images. It then moves on to the next batch of images and learns patterns from that sample, and so on until it completes a set number of learning cycles. The batch size will be defined below, but at this point, it is useful to determine the total number of images available. That helps to determine the batch size and the number of cycles needed to accurately process the images.
numtrainingfiles <- length(list.files(paste(filepath, "\\train\\111", sep = ""))) +
length(list.files(paste(filepath, "\\train\\112", sep = ""))) +
length(list.files(paste(filepath, "\\train\\113", sep = ""))) +
length(list.files(paste(filepath, "\\train\\12", sep = ""))) +
length(list.files(paste(filepath, "\\train\\40", sep = "")))
CNNs take vectorized image pixel values (3 in this case, that correspond to the 3 RGB values) and perform basic arithmetic operations on them to generate a “stack” of matrices that contain average pixel values for small pieces of each image at a time. It then combines these stacks into new smaller layers, and repeats the process until it gets to a point where there is only a 1-dimensional array of sums and averages that will then get filtered down into a series of probabilities that correspond to the model’s interpretation of what category each image belongs to.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
input_shape = c(192, 192, 3)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_flatten() %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 1024, activation = "relu") %>%
layer_dense(units = 5, activation = "softmax")
Setting the loss metric, optimizer and learning rate of the CNN.
Ultimately, we want the model’s effectiveness to be determined by how accurate it is. While this might seem obvious, this is something that needs to be specified for the network to work with. As noted above, the network will learn what it can from small image samples and then move on to the next group. However, with data as complex as imagery, we don’t want the model to spend too much or too little time learning on each group of images. If it tries to learn too slowly, then the amount of processing time it takes to over thousands of pixels in thousands of images becomes prohibitively large. If it tries to learn too quickly, then it likely will not pick up on all of the nuances that help distinguish one type of image from another and accuracy will suffer. We have to manually set the learning rate so that processing time and accuracy are balanced.
model %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_rmsprop(lr = 0.0001),
metrics = c("acc")
)
In order to capture patterns in images that may not be normalized, the image generator will take the image samples and perform a number of basic manipulations on them before training the network. These manipulations include image rotation, horizontal flips, vertical flips, and zooms. The reason behind this is that patterns in aerial imagery are not often found in the same orientation from one image to the next. For example, the letter “P” is only recognizable when it is written in a certain way. When someone flips the letter “P” vertically, it is no longer a “P” and becomes a “b”. Patterns that are easily identified by humans for aerial imagery purposes aren’t subject to that constraint. A large industrial complex is still a large industrial complex no matter how you orient it. The training generator helps the network to “learn” this.
train_datagen <- image_data_generator(
rescale = 1/255,
horizontal_flip = TRUE,
vertical_flip = TRUE,
zoom_range = .2,
rotation_range = 15,
fill_mode = "reflect"
)
There is no need to perform image manipulation on the validation images since these are not directly part of the “learning” process.
validation_datagen <- image_data_generator(rescale = 1/255)
The CNN will process images in groups of 15 in order to learn identification patterns.
batchsize <- 15
Each pixel for every image has 3 values (corresponding to the RGB values the pixel displays), and each image can have hundreds of thousands or millions of pixels in them. Because of the mathematics involved, CNN processing time increases exponentially with the number of input values and having millions of pixel values is simply not feasible to work with from a computational standpoint. As such, it is very important to scale the images down in size (in this case, to 192x192) so that the network can train itself in a reasonable amount of time. Of course, it’s harder for humans to distinguish features in smaller images and the same is true of CNNs when it comes to identifying patterns in images. Again, there’s a balance to be struck between prohibitively large amounts of processing time and overall accuracy.
train_generator <- flow_images_from_directory(
train_dir,
train_datagen,
target_size = c(192, 192),
batch_size = batchsize,
class_mode = "categorical"
)
validation_generator <- flow_images_from_directory(
validation_dir,
validation_datagen,
target_size = c(192, 192),
batch_size = batchsize,
class_mode = "categorical"
)
batch <- generator_next(train_generator)
Number of epochs has been set to 40 and the number of steps is set to the total number of training images divided by the batch size (both set above). Prior testing has indicated that accuracy is not significantly improved by increasing the number of epochs much past 40.
history <- model %>% fit_generator(
train_generator,
steps_per_epoch = round(numtrainingfiles/batchsize),
epochs = 40,
validation_data = validation_generator,
validation_steps = round(numtrainingfiles/batchsize)
)
Note the training accuracy of 80% and the validation accuracy of 78%, indicating good model generalization and a lack of undesirable overfitting.
history
## Trained on NULL samples (batch_size=NULL, epochs=50)
## Final epoch (plot to see history):
## val_loss: 0.5685
## val_acc: 0.7874
## loss: 0.4741
## acc: 0.8029
test_dir <- "F:\\LandPro_2010_Imagery\\fulton_2010_jpg\\test"
test_datagen <- image_data_generator(
rescale = 1/255
)
test_generator <- flow_images_from_directory(
test_dir,
test_datagen,
color_mode = "rgb",
target_size = c(192,192),
batch_size = 16,
class_mode = "categorical",
shuffle = FALSE
)
model %>% evaluate_generator(test_generator, steps = 340)
This outputs 5 separate values, each one equal to the probability that an image belongs to each of the 5 separate land use categories as calculated by the model.
preds <- predict_generator(model,
test_generator,
steps = 340)
predictions <- data.frame(test_generator$filenames)
predictions <- cbind(predictions, preds)
names(predictions) <- c("filename", "cat111prob", "cat112prob", "cat113prob", "cat12prob", "cat40prob")
predictions$classprediction <- sapply(1:nrow(predictions), function(x) min(which(predictions[x,2:6] == max(predictions[x, 2:6]))))
predictions$classprediction <- case_when(predictions$classprediction == 1 ~ "111",
predictions$classprediction == 2 ~ "112",
predictions$classprediction == 3 ~ "113",
predictions$classprediction == 4 ~ "12",
predictions$classprediction == 5 ~ "40"
)
predictions$classactual <- case_when(grepl("111", predictions$filename) == TRUE ~ "111",
grepl("112", predictions$filename) == TRUE ~ "112",
grepl("113", predictions$filename) == TRUE ~ "113",
grepl("_12", predictions$filename) == TRUE ~ "12",
grepl("_40", predictions$filename) == TRUE ~ "40"
)
predictions$classactual <- as.factor(predictions$classactual)
predictions$classprediction <- as.factor(predictions$classprediction)
confusionMatrix(predictions$classprediction, predictions$classactual)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 111 112 113 12 40
## 111 679 257 11 2 0
## 112 491 1894 308 39 0
## 113 1 9 71 1 0
## 12 1 2 4 204 5
## 40 10 2 9 11 1422
##
## Overall Statistics
##
## Accuracy : 0.7859
## 95% CI : (0.7748, 0.7968)
## No Information Rate : 0.3983
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6891
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: 111 Class: 112 Class: 113 Class: 12 Class: 40
## Sensitivity 0.5745 0.8752 0.17618 0.79377 0.9965
## Specificity 0.9365 0.7437 0.99781 0.99768 0.9920
## Pos Pred Value 0.7155 0.6933 0.86585 0.94444 0.9780
## Neg Pred Value 0.8878 0.9000 0.93796 0.98984 0.9987
## Prevalence 0.2176 0.3983 0.07418 0.04730 0.2627
## Detection Rate 0.1250 0.3486 0.01307 0.03755 0.2617
## Detection Prevalence 0.1747 0.5029 0.01509 0.03976 0.2676
## Balanced Accuracy 0.7555 0.8094 0.58700 0.89573 0.9943
The confusion matrix shows that the model is generally sound when it comes to distinguishing between various levels of residential densities (classes 111, 112, and 113). It is promising to see very few gross errors such as “high density residential” (category 113) being commonly mistaken for “low density residential” (category 111).
Note the table below which shows the tally of incorrect classifications for each category. The network generates far fewer misclassifications for certain categories (such as “forest”, category 40) than others. This is to be expected, as residential densities aren’t binary designations. For residential classifications, the model not only has to determine ‘what’ features are present/not present, but it also has to determine ‘how many’.
wrongPredictions <- predictions[predictions$classprediction != predictions$classactual,]
table(wrongPredictions$classactual, wrongPredictions$classprediction)
##
## 111 112 113 12 40
## 111 0 491 1 1 10
## 112 257 0 9 2 2
## 113 11 308 0 4 9
## 12 2 39 1 0 11
## 40 0 0 0 5 0
Of course, it’s still very important to look at the mistaken images to see if the model’s incorrect predictions still make sense to the human eye.
For reference:
wrongPredictions <- wrongPredictions[sample(1:nrow(wrongPredictions), 25),]
plotImage <- function(imagenum) {
myImage <- load.image(paste(test_dir, wrongPredictions$filename[imagenum], sep = "\\"))
myImage <- resize(myImage, 500, 500)
plot(myImage, axes = FALSE)
legend(0.5, 0, bty = "n", text.font = 2, text.col = "white",
legend = c(paste("Predicted class: ", wrongPredictions$classprediction[imagenum], sep = ""),
paste("Actual class: ", wrongPredictions$classactual[imagenum], sep = ""),
paste("Probability 111: ", round(wrongPredictions$cat111prob[imagenum], 3), sep = ""),
paste("Probability 112: ", round(wrongPredictions$cat112prob[imagenum], 3), sep = ""),
paste("Probability 113: ", round(wrongPredictions$cat113prob[imagenum], 3), sep = ""),
paste("Probability 12: ", round(wrongPredictions$cat12prob[imagenum], 3), sep = ""),
paste("Probability 40: ", round(wrongPredictions$cat40prob[imagenum], 3), sep = ""))
)
}
sapply(1:25, plotImage)